# What is the Active Prevalence of COVID-19?

**By** Mu-Jeung Yang, Marinho Bertanha, Nathan Seegert, Maclean Gaulin, Adam Looney, Brian Orleans, Andrew T. Pavia, Kristina Stratford, Matthew Samore, Steven Alder

This repository contains the code to recreate the figures and tables in "What is the Active Prevalence of COVID-19?"

## Data

Our primary data are publicly available from covidtracking.com and the Census Bureau.
The testing data used to calibrate our model contain sensitive personally identifiable information (PII) and are therefore not available for distribution.

## Instructions

The code can generally be run in the numerical order given by the filenames. All but one of the files are Stata do-files, run using Stata 17 (other versions should generally be compatible):

  1. `1.0_load_data.do` is run by other files, not individually.
  2. `1.1_cache-load_lasso_data.do` creates the dataset for the lasso regressions, which use interaction terms. It generates those interaction variables and names them so they can be used in loops and with Stata's `*` notation.
  3. `2.1_cache_bootstrap_results.do` caches the CIs from our SE bootstrap procedure to `./output/bootstrap/`, because the procedure takes a long time to run.
  4. `3.0_table_1.do` creates summary statistics and `tex` variables to be used in the paper.
  5. `3.1_table_2.do` creates Table 2, which uses bootstrap SEs, so `2.1_cache_bootstrap_results.do` must be run first. It also saves data to a temporary file used to make Figure 2.
  6. `3.2_table_3.do` makes the state estimates in Table 3.
  7. `4.0_figure_1.ipynb` uses Python to generate Figure 1.
  8. `4.1_figure_2.do` makes both panels of Figure 2, using the cached file from `3.1_table_2.do`.
  9. `5.0_appendix_c_table_1.do` makes Table 1 in Appendix C.
  10. `5.1_appendix_c_table_2.do` makes Table 2 in Appendix C.
  11. `6.0_appendix_b_figure_3.do` makes Figure 3 in Appendix B.

To run, extract this repo to `~/Desktop/RESTAT_CODE` and execute the files in Stata or Python in the order listed above.
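
For convenience, the run order can also be scripted. The following is a minimal sketch and is not part of the repo. It assumes a Unix-style Stata executable named `stata-mp` that supports batch mode (`-b do`), and that `jupyter nbconvert` is installed to execute the notebook. The helper names `PIPELINE`, `build_command`, and `run_pipeline` are hypothetical. Adjust the executable name and paths to your installation.

```python
import subprocess
from pathlib import Path

# Run order taken from the list above. 1.0_load_data.do is omitted because
# it is called by the other files rather than run individually.
PIPELINE = [
    "1.1_cache-load_lasso_data.do",
    "2.1_cache_bootstrap_results.do",  # must precede 3.1_table_2.do
    "3.0_table_1.do",
    "3.1_table_2.do",                  # must precede 4.1_figure_2.do
    "3.2_table_3.do",
    "4.0_figure_1.ipynb",
    "4.1_figure_2.do",
    "5.0_appendix_c_table_1.do",
    "5.1_appendix_c_table_2.do",
    "6.0_appendix_b_figure_3.do",
]


def build_command(filename, stata_bin="stata-mp"):
    """Return the argument list that would run one pipeline file.

    `stata_bin` is an assumption: the executable name differs across
    Stata editions and operating systems.
    """
    if filename.endswith(".ipynb"):
        # Execute the notebook in place with nbconvert.
        return ["jupyter", "nbconvert", "--to", "notebook",
                "--execute", "--inplace", filename]
    # Stata batch mode on Unix-like systems.
    return [stata_bin, "-b", "do", filename]


def run_pipeline(repo_dir=None, stata_bin="stata-mp"):
    """Run every file in order from inside the repo directory.

    Stops at the first command that exits with a non-zero status.
    """
    if repo_dir is None:
        repo_dir = Path.home() / "Desktop" / "RESTAT_CODE"
    for name in PIPELINE:
        subprocess.run(build_command(name, stata_bin), cwd=repo_dir, check=True)


if __name__ == "__main__":
    run_pipeline()
```

Because the calibration data are not distributed, a full run is only expected to succeed where those data are available. Note also that Stata's batch mode may not signal do-file errors through its exit status, so it is worth checking the generated `.log` files after a run.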
